1. Introduction. This (tutorial) paper grew out of the need to motivate the usual formulation of a "Total Least
Squares problem" and to explain the way it is solved using the "Singular Value Decomposition". Although it is
an important generalization of (ordinary) least squares and no more difficult to understand, it has hardly been
treated in numerical textbooks up to now. In the well-known book of Golub & Van Loan [?] and in [?] the problem is
formulated as follows:

Given a matrix $A \in \mathbb{R}^{m \times n}$ with $m > n$ and a vector $b \in \mathbb{R}^m$,
find residuals $E \in \mathbb{R}^{m \times n}$ and $r \in \mathbb{R}^m$ that minimize
the Frobenius norm $\|(E \mid r)\|_F$ subject to the condition $b + r \in \mathrm{Im}(A + E)$.    (1.1)

It is proposed as a more natural way to approximate the data if both $A$ and $b$ are contaminated by "errors". In our
opinion, it is not made sufficiently clear why this indeed is a natural generalization of the standard least squares
problem and why it makes sense to study it. On the other hand, the classroom note of Y. Nievergelt [?] gives a very
nice introduction, but it tells only half of the story in that it considers (multiple) regression only.

In this note, we shall give a unified view of ordinary and total least squares problems and their solution. As the
geometry underlying the problem setting greatly contributes to the understanding of the solution, we shall introduce
least squares problems and their generalization via interpretations in both column space and (the dual) row space
and we shall use both approaches to clarify the solution. After a study of the least squares approximation for simple
regression in section 3 we introduce the notion of approximation in the sense of "Total Least Squares (TLS)" for this
problem in section 4. In the next section we consider ordinary and total least squares approximations for multiple
regression problems and in section 6 we study the solution of a general overdetermined system of equations in
TLS-sense. In a final section we consider generalizations with multiple right-hand sides and with "frozen" columns. We
remark that a TLS-approximation need not exist in general; however, the line (or hyperplane) of best approximation
in TLS-sense for a regression problem always exists.

As numerical algorithms such as the QR-factorization and the Singular Value Decomposition (SVD) are relatively
well-known and nicely implemented in a package like MATLAB, we shall not consider numerical algorithms to compute
the solutions effectively.

2. Primal vs. dual approach. To make clear how both column- and row-space arguments can be used to derive
the solution of a least squares problem, we consider least squares in one dimension:

Given $m$ points $\{x_i \mid i = 1, \dots, m\}$, find $z \in \mathbb{R}$ that minimizes the quadratic functional

$$ f(z) := \sum_{i=1}^{m} (x_i - z)^2 . \tag{2.1} $$

The function $z \mapsto f(z)$ is a parabola. When we shift its center to the average $\bar x := \frac{1}{m} \sum_{i=1}^{m} x_i$,

$$ f(z) = \sum_{i=1}^{m} (x_i - z)^2 = \sum_{i=1}^{m} \left\{ (x_i - \bar x)^2 + 2 (x_i - \bar x)(\bar x - z) + (\bar x - z)^2 \right\} , \tag{2.2} $$

we see that the sum of double products vanishes. Hence, the average $\bar x$ is the unique minimizer.

In the dual approach we consider the data as one point $x \in \mathbb{R}^m$. The functional $f(z)$ then measures the
square of the Euclidean distance from $x$ to the point $z e$,

$$ f(z) = \| x - z e \|_2^2 , \quad \text{where } x := (x_1, \dots, x_m)^T \text{ and } e := (1, \dots, 1)^T . \tag{2.3} $$

Fig. 1. Vector $x$, its orthogonal projection on span{e} and the residual vector $x - ze$ in the dual approach.

From fig. 1, which shows the plane in $\mathbb{R}^m$ spanned by $x$ and $e$, we find the orthogonal projection of $x$ on span{e}
as minimizer,

$$ z = \frac{x^T e}{e^T e} = \frac{1}{m} \sum_{i=1}^{m} x_i . \tag{2.4} $$



We see that both the primal and the dual approach provide the solution in different ways. In the primal approach 
we use the fact that linear terms vanish by a shift towards the average. In the dual approach we use an orthogonality 
argument. 
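The agreement between the two approaches is easy to check numerically. The following NumPy sketch (the data values and variable names are illustrative assumptions, not taken from the paper) computes the minimizer of (2.1) once as the average and once via the projection formula (2.4), and evaluates the inner product of the residual with e.

```python
import numpy as np

# Illustrative data (an assumption for this sketch).
x = np.array([2.0, 3.0, 7.0, 8.0])
e = np.ones_like(x)

# Primal view: the minimizer of f(z) = sum (x_i - z)^2 is the average.
z_primal = x.mean()

# Dual view (2.4): orthogonal projection of x onto span{e}.
z_dual = (x @ e) / (e @ e)

# The residual x - z e should be orthogonal to e.
residual = x - z_dual * e
print(z_primal, z_dual, residual @ e)
```

Both values coincide, and the residual is orthogonal to e up to rounding.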



3. Simple regression. In the plane $\mathbb{R}^2$ we are given $m$ data points (abscissae and ordinates)

$$ \{ (x_i, y_i) \in \mathbb{R}^2 \mid i = 1, \dots, m \} \tag{3.1} $$

that should satisfy the linear (affine) relation $y(x) = a + bx$; find the parameters $a$ and $b$ that provide a "best fit",
minimizing the sum of squares of the residuals

$$ f(a, b) := \sum_{i=1}^{m} (y_i - a - b x_i)^2 . \tag{3.2} $$

We can interpret this as searching for the line $\ell := \{(x, y) \in \mathbb{R}^2 \mid y = a + bx\}$ "nearest" to the datapoints,
minimizing vertical distances and making the tacit assumption that model errors in the data-model $y = a + bx$ are
confined to the observed $y$-coordinates, as depicted in fig. 2.

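Analogously to (2.2), using the centroid $\bar z := (\bar x, \bar y)^T = \left( \frac{1}{m} \sum_{i=1}^{m} x_i , \frac{1}{m} \sum_{i=1}^{m} y_i \right)^T$, we rewrite $f$ and find
as before that the double products vanish,

$$ f(a, b) := \sum_{i=1}^{m} (y_i - a - b x_i)^2 = \sum_{i=1}^{m} \left( y_i - \bar y - b (x_i - \bar x) \right)^2 + m (\bar y - a - b \bar x)^2 \ge \sum_{i=1}^{m} \left( y_i - \bar y - b (x_i - \bar x) \right)^2 , \quad \forall\, a, b, \tag{3.3} $$

with equality if $\bar y = a + b \bar x$. This implies that the centroid is located on the line: $\bar z \in \ell$. Eliminating $a$, it remains
to minimize a function of $b$ alone, which is a parabola. Hence the minimizer of (3.2) is

$$ b = \frac{\sum_{i=1}^{m} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{m} (x_i - \bar x)^2} \quad \text{and} \quad a = \bar y - b \bar x . \tag{3.4} $$

Fig. 2. Simple linear regression; distances are measured along the y-axis.

In the dual approach in $\mathbb{R}^m$ we interpret $x_i$ and $y_i$ as components of vectors $x$ and $y \in \mathbb{R}^m$,

$$ x := (x_1, x_2, \dots, x_m)^T , \quad y := (y_1, y_2, \dots, y_m)^T , \quad e := (1, 1, \dots, 1)^T \quad \text{and} \quad A := (e \mid x) \in \mathbb{R}^{m \times 2} . \tag{3.5} $$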



In this setting the functional $f$ measures the square of the distance from $y$ to a linear combination of $e$ and $x$,

$$ f(a, b) = \| y - a e - b x \|_2^2 = \left\| y - A \binom{a}{b} \right\|_2^2 . \tag{3.6} $$

As in (2.4) it is minimized by the orthogonal projection of $y$ on the span of $x$ and $e$,

$$ f \text{ minimal} \iff y - A \binom{a}{b} \perp \mathrm{Im}(A) . \tag{3.7} $$

If the rank of $A$ is maximal, the solution can be computed, see [?], from the Normal Equations or better by an
Orthogonal Factorization,

$$ A^T A \binom{a}{b} = A^T y \quad \text{or better} \quad A = QR \ \text{ and } \ R \binom{a}{b} = Q^T y . \tag{3.8} $$

Otherwise we can use the Singular Value Decomposition

$$ A = U \Sigma V^T \quad \text{and} \quad \binom{a}{b} = V \Sigma^{\dagger} U^T y . \tag{3.9} $$
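The routes (3.4), (3.8) and (3.9) can be compared on a small data set. The sketch below uses NumPy and invented sample data (both are assumptions made for illustration); `np.linalg.pinv` plays the role of $V \Sigma^{\dagger} U^T$ in (3.9).

```python
import numpy as np

# Invented sample data for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
A = np.column_stack([np.ones(x.size), x])      # A = (e | x), cf. (3.5)

# Closed form (3.4), obtained after the shift to the centroid.
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
ab_closed = np.array([y.mean() - b_hat * x.mean(), b_hat])

# Normal equations (3.8).
ab_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Orthogonal factorization (3.8): A = QR, then solve R (a, b)^T = Q^T y.
Q, R = np.linalg.qr(A)
ab_qr = np.linalg.solve(R, Q.T @ y)

# SVD / pseudoinverse (3.9).
ab_svd = np.linalg.pinv(A) @ y
```

All four vectors agree up to rounding when A has full column rank; only the pseudoinverse route remains usable when it does not.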



4. Total Least Squares for simple regression. In (3.2) and fig. 2 we considered the problem of locating a line
nearest to a collection of points, where the distance is measured along the y-axis. It looks "more natural" to use the
(shorter) true Euclidean distance instead, as drawn in fig. 3, which yields the line of Total Least Squares.

So we consider the Total Least Squares problem of finding the line $\ell$ that minimizes the sum of squares of true
distances:

$$ f(\ell) := \sum_{i=1}^{m} \mathrm{dist}\big( (x_i, y_i), \ell \big)^2 . \tag{4.1} $$

Fig. 3. Line of Total Least Squares: Model errors are distributed over the x- and y-coordinates.



Instead of asking for a line $y = ax + b$, we use the more symmetric form

$$ \ell = \{ (x, y) \in \mathbb{R}^2 \mid a + r_1 x + r_2 y = 0 \} = w + r^{\perp} , \quad \text{with } \|r\|^2 = r_1^2 + r_2^2 = 1 , \tag{4.2} $$

where $w$ is an arbitrary point on the line $\ell$, i.e. $a + r_1 w_1 + r_2 w_2 = 0$. With this parametrization of $\ell$ we accept
the possibility that $r_2$ may become zero, and hence that the line cannot be recast in the form $y = \alpha + \beta x$. In the
description $\ell = w + r^{\perp}$, where $r$ is of unit length, the distance from a point $z$ to $\ell$ is given by, see fig. 4,

$$ \mathrm{dist}(z, \ell) = | r^T (z - w) | \quad \text{where } \ell = w + r^{\perp} = \{ z \in \mathbb{R}^2 \mid r^T (z - w) = 0 \} \text{ and } \|r\| = 1 . \tag{4.3} $$

Fig. 4. The line $\ell$ in the plane is given as the line through the vector $w$ orthogonal to the
vector $r$ of unit length. For a given vector $x$ the difference vector $x - w$ is drawn together
with its projection along the line $\ell$ and its orthogonal complement.



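Hence the TLS problem is to find $r$ and $w$ that minimize the functional

$$ I(r, w) := \sum_{i=1}^{m} \big( r^T (z_i - w) \big)^2 = \sum_{i=1}^{m} \big( r_1 (x_i - w_1) + r_2 (y_i - w_2) \big)^2 \tag{4.4} $$

where

$$ z_i = \binom{x_i}{y_i} \quad \text{and} \quad r = \binom{r_1}{r_2} , \quad \|r\|^2 = r_1^2 + r_2^2 = 1 . $$

Making the shift to the centroid, as in (3.3) and (2.2), we find again that the sum of double products vanishes,

$$ \begin{aligned} I(r, w) &= \sum_{i=1}^{m} \big( r^T (z_i - w) \big)^2 \\ &= \sum_{i=1}^{m} \big( r^T (z_i - \bar z) \big)^2 + \sum_{i=1}^{m} 2\, r^T (z_i - \bar z)\, r^T (\bar z - w) + m \big( r^T (\bar z - w) \big)^2 \\ &= I(r, \bar z) + m \big( r^T (\bar z - w) \big)^2 \ \ge\ I(r, \bar z) . \end{aligned} \tag{4.5} $$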



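Clearly, the centroid $\bar z := (\bar x, \bar y)^T$ minimizes the functional $w \mapsto I(r, w)$ for every $r \in \mathbb{R}^2$. This implies that the
minimizing line $\ell = \bar z + r^{\perp}$ passes through the centroid (as did the line of simple regression) and that we are left with
the reduced minimization problem:
Find the vector $r$ with $\|r\|_2 = 1$ minimizing

$$ I(r, \bar z) = \sum_{i=1}^{m} \big( r_1 (x_i - \bar x) + r_2 (y_i - \bar y) \big)^2 = \|B r\|_2^2 = r^T B^T B\, r , \tag{4.6} $$

where $B \in \mathbb{R}^{m \times 2}$ is the matrix

$$ B := (x - \bar x e \mid y - \bar y e) = \begin{pmatrix} x_1 - \bar x & y_1 - \bar y \\ x_2 - \bar x & y_2 - \bar y \\ \vdots & \vdots \\ x_m - \bar x & y_m - \bar y \end{pmatrix} . \tag{4.7} $$

The problem of minimizing $\|Br\|_2^2$ subject to $\|r\|_2 = 1$ is solved by the Singular Value Decomposition of $B$,

$$ B = U \Sigma V^T \quad \text{with} \quad \Sigma = \begin{pmatrix} \sigma_1 & \\ & \sigma_2 \end{pmatrix} \quad \text{and} \quad \sigma_1 \ge \sigma_2 . $$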



The solution vector $r$ of (4.6) is the right singular vector of $B$ corresponding to the smaller singular value of $B$. So
we conclude:

a. The solution always exists and is given by the line through the centroid orthogonal to the subdominant singular
vector of $B$.

b. As $r_2$ can be zero, the solution need not be expressible in the form $y = \alpha + \beta x$.

c. The solution is unique iff $\sigma_1 \ne \sigma_2$.

d. The shift (4.5) to the centroid $\bar z \in \ell$ is the key in finding the solution, as shown in [?].

Fig. 5. Components $(f_i, g_i)$ are the best approximations of $(x_i, y_i)$ on the line $a + r_1 x + r_2 y = 0$.
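The primal recipe of (4.6) and (4.7) translates almost literally into code. The following NumPy sketch is one possible rendering; the function name `tls_line` and the sample points are assumptions made for illustration. The example data lie on a vertical line, which is the case of conclusion b.

```python
import numpy as np

def tls_line(x, y):
    """Return (centroid, unit normal r, singular values) of the TLS line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z_bar = np.array([x.mean(), y.mean()])
    B = np.column_stack([x - z_bar[0], y - z_bar[1]])   # matrix (4.7)
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    r = Vt[-1]      # right singular vector of the smaller singular value
    return z_bar, r, s

# Points on the vertical line x = 3: r_2 vanishes, so the line
# cannot be written as y = alpha + beta * x.
z_bar, r, s = tls_line([3.0, 3.0, 3.0, 3.0], [0.0, 1.0, 2.0, 5.0])
```

The line is described by the pair (z_bar, r) as the set of points z with r^T (z - z_bar) = 0; comparing the two entries of s gives the uniqueness test of conclusion c.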



In the dual formulation we consider the vectors $x$, $y$ and $e$ as in (3.5) and we describe the line $\ell$ as in (4.2) by
$\ell := \{ (\xi, \eta) \mid a + r_1 \xi + r_2 \eta = 0 \}$. For $i = 1 \dots m$ we denote by $(f_i, g_i)$ the point on $\ell$ nearest to $(x_i, y_i)$, see fig. 5,
and by $(\bar f, \bar g) := \frac{1}{m} \sum_{i=1}^{m} (f_i, g_i)$ we denote their average. We define the vectors of first and second components
$f, g \in \mathbb{R}^m$,

$$ f := (f_1, f_2, \dots, f_m)^T \quad \text{and} \quad g := (g_1, g_2, \dots, g_m)^T . $$

These vectors clearly satisfy the relation $a e + r_1 f + r_2 g = 0$. So we can rephrase the minimization problem (4.1) as
the quest for vectors $f$ and $g$ that minimize the sum of squares of distances

$$ I(a, r) := \sum_{i=1}^{m} \left\{ (x_i - f_i)^2 + (y_i - g_i)^2 \right\} = \| x - f \|_2^2 + \| y - g \|_2^2 \tag{4.8} $$

subject to $a e + r_1 f + r_2 g = 0$, $r_1^2 + r_2^2 = 1$.

Decomposing the vectors in their components in span{e} and in the orthogonal complement $e^{\perp}$ we obtain

$$ I(a, r) = \| x - f - (\bar x - \bar f) e \|_2^2 + \| y - g - (\bar y - \bar g) e \|_2^2 + m (\bar x - \bar f)^2 + m (\bar y - \bar g)^2 . \tag{4.9} $$

The contributions from the parts in span{e} are minimized by the choice $\bar f = \bar x$ and $\bar g = \bar y$ and the subsidiary
condition implies $a + r_1 \bar x + r_2 \bar y = 0$ for that choice. Choosing $\tilde f := f - \bar x e$ and $\tilde g := g - \bar y e$ we are left with the
problem to minimize in $e^{\perp}$ the functional:

$$ \| x - \bar x e - \tilde f \|_2^2 + \| y - \bar y e - \tilde g \|_2^2 \quad \text{subject to} \quad r_1 \tilde f + r_2 \tilde g = 0 . \tag{4.10} $$

It is not necessary to impose the condition $\tilde f, \tilde g \in e^{\perp}$, since it is automatically satisfied by the minimizer, because
$x - \bar x e$ and $y - \bar y e$ satisfy this condition. In matrix notation with $B := (x - \bar x e \mid y - \bar y e)$ and $E := (\tilde f \mid \tilde g)$ this
minimization problem takes the form

$$ \text{minimize } \| B - E \|_F^2 \quad \text{subject to} \quad \mathrm{rank}(E) = 1 . \tag{4.11} $$

From the Singular Value Decomposition of $B$,

$$ B = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T \quad \text{we find} \quad E = \sigma_1 u_1 v_1^T , \quad \text{provided } \sigma_1 > \sigma_2 . $$

Hence the total least squares solution is (as before) given by

$$ E v_2 = 0 \quad \text{implying} \quad r = v_2 . $$



There is a difference in flavour between both approaches. Whereas the primal formulation (4.6) directly produces
the minimizing vector, the dual approach (4.11) takes a detour. The latter provides a minimizing matrix $E$;
the parameters of the line are found only afterwards as the coefficients in the linear combination of the columns of $E$
that equals zero.
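That the primal problem (4.6) and the dual problem (4.11) lead to the same minimum and the same vector r can be checked numerically. The sketch below is a hedged illustration in NumPy with invented data; the variable names are assumptions.

```python
import numpy as np

# Invented sample data for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.2, 0.9, 2.3, 2.8, 4.1])
B = np.column_stack([x - x.mean(), y - y.mean()])

U, s, Vt = np.linalg.svd(B, full_matrices=False)

# Dual approach (4.11): best rank-1 approximation E = sigma_1 u_1 v_1^T.
E = s[0] * np.outer(U[:, 0], Vt[0])

# Primal approach (4.6): r is the right singular vector of sigma_2.
r = Vt[1]

dual_value = np.linalg.norm(B - E, "fro") ** 2     # ||B - E||_F^2
primal_value = np.linalg.norm(B @ r) ** 2          # ||B r||_2^2
```

Both values equal $\sigma_2^2$, and $E r = 0$; the columns of E are orthogonal to e without this having been imposed, as remarked below (4.10).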

5. Multiple regression. The extension of ordinary and total least squares to multiple regression is almost
straightforward. As most ideas in 2D-regression easily carry over, we can be brief about it. We are given the cloud of $m$
datapoints in $\mathbb{R}^n$ (each point consisting of an "abscissa" in $\mathbb{R}^{n-1}$ and an ordinate in $\mathbb{R}$),

$$ \{ z_i := (x_1^{(i)}, \dots, x_{n-1}^{(i)}, y_i)^T \in \mathbb{R}^n \mid i = 1, \dots, m \} , \tag{5.1} $$

that should satisfy the linear (affine) model $y(x_1, \dots, x_{n-1}) = c_0 + c_1 x_1 + c_2 x_2 + \dots + c_{n-1} x_{n-1}$. In ordinary least
squares the parameters are determined by minimizing the functional $J$,

$$ J(c) := \sum_{i=1}^{m} \big( y_i - c_0 - c_1 x_1^{(i)} - \dots - c_{n-1} x_{n-1}^{(i)} \big)^2 , \quad c := (c_0, \dots, c_{n-1})^T , \tag{5.2} $$

and we can interpret this as the search for the best fitting hyperplane in $\mathbb{R}^n$,

$$ \{ (x_1, \dots, x_{n-1}, y)^T \in \mathbb{R}^n \mid y = c_0 + c_1 x_1 + c_2 x_2 + \dots + c_{n-1} x_{n-1} \} . \tag{5.3} $$



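As in (3.3), the double products vanish by a shift of the center to the centroid, implying

$$ J(c) \ge \sum_{i=1}^{m} \big( y_i - \bar y - c_1 (x_1^{(i)} - \bar x_1) - \dots - c_{n-1} (x_{n-1}^{(i)} - \bar x_{n-1}) \big)^2 $$

with equality if $\bar y = c_0 + c_1 \bar x_1 + \dots + c_{n-1} \bar x_{n-1}$. Hence, the centroid is in the hyperplane. However, more than one
unknown parameter is left and the easy argument of (3.4) cannot be applied directly. On the other hand, the dual
approach (in "column space") (3.5-3.7) is straightforward and provides the solution easily. Defining vectors $x_k$ and
$y \in \mathbb{R}^m$ and the matrix $A \in \mathbb{R}^{m \times n}$,

$$ x_k := \begin{pmatrix} x_k^{(1)} \\ x_k^{(2)} \\ \vdots \\ x_k^{(m)} \end{pmatrix} , \quad y := \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} , \quad \text{and} \quad A := (e \mid x_1 \mid \dots \mid x_{n-1}) = \begin{pmatrix} 1 & x_1^{(1)} & \cdots & x_{n-1}^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_{n-1}^{(2)} \\ \vdots & \vdots & & \vdots \\ 1 & x_1^{(m)} & \cdots & x_{n-1}^{(m)} \end{pmatrix} , $$

the functional (5.2) takes the form:

$$ J(c) = \| y - c_0 e - c_1 x_1 - \dots - c_{n-1} x_{n-1} \|_2^2 = \| y - A c \|_2^2 . \tag{5.4} $$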



As in (2.4) and (3.7) it is minimized by the orthogonal projection of $y$ on the span of $x_1, \dots, x_{n-1}$ and $e$, i.e. on
$\mathrm{Im}(A)$,

$$ J \text{ minimal} \iff y - A c \perp \mathrm{Im}(A) . \tag{5.5} $$

As before, if the rank of $A$ is maximal, the solution can be computed from the Normal Equations or better by an
Orthogonal Factorization, see [?],

$$ A^T A c = A^T y \quad \text{or better} \quad A = QR \ \text{ and } \ R c = Q^T y . \tag{5.6} $$

Otherwise we can use the Singular Value Decomposition

$$ A = U \Sigma V^T \quad \text{and} \quad c = V \Sigma^{\dagger} U^T y . \tag{5.7} $$



The total least squares approximation minimizes the sum of squares of true distances. We do not attribute a
special position to the y-coordinate and describe the hyperplane in $\mathbb{R}^n$, as in (4.2), by $w + r^{\perp}$. The functional to
minimize is:

$$ I(r, w) := \sum_{i=1}^{m} \big( r^T (z_i - w) \big)^2 = \sum_{i=1}^{m} \big( r^T (z_i - \bar z) \big)^2 + m \big( r^T (\bar z - w) \big)^2 \tag{5.8} $$

subject to $\|r\| = 1$. Since the double products in the second right-hand side cancel, the centroid (again) is in the
hyperplane and it minimizes (5.8) for all $r$. We are left with the reduced minimization problem, to find $r$ with
$\|r\|_2 = 1$ minimizing

$$ I(r, \bar z) = \| B r \|_2^2 \quad \text{with} \quad B := \begin{pmatrix} x_1^{(1)} - \bar x_1 & \cdots & x_{n-1}^{(1)} - \bar x_{n-1} & y_1 - \bar y \\ x_1^{(2)} - \bar x_1 & \cdots & x_{n-1}^{(2)} - \bar x_{n-1} & y_2 - \bar y \\ \vdots & & \vdots & \vdots \\ x_1^{(m)} - \bar x_1 & \cdots & x_{n-1}^{(m)} - \bar x_{n-1} & y_m - \bar y \end{pmatrix} . \tag{5.9} $$

The solution vector $r$ is the right singular vector of $B$ corresponding to the smallest singular value of $B$. We conclude:

a. A solution always exists; it is given by the hyperplane through the centroid and orthogonal to the right singular
vector belonging to the smallest singular value of matrix $B$. It is not expressible in the form (5.3) if $r_n = 0$.

b. The solution is unique iff $\sigma_{n-1} > \sigma_n$.

c. The shift of (5.8) to the centroid $\bar z \in \ell$ is the key in finding the solution.
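Conclusions a to c can be turned into a short computation. The NumPy sketch below is an illustration under stated assumptions: the function names `tls_hyperplane` and `explicit_coefficients`, the tolerance and the test data are all invented for this example. The second function converts the normal vector r into the coefficients of form (5.3) whenever the last component of r is non-zero.

```python
import numpy as np

def tls_hyperplane(Z):
    """Z: (m, n) array whose rows are the points z_i = (x_1, ..., x_{n-1}, y)."""
    Z = np.asarray(Z, dtype=float)
    z_bar = Z.mean(axis=0)
    B = Z - z_bar                      # rows z_i - z_bar, cf. (5.9)
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    r = Vt[-1]                         # singular vector of the smallest singular value
    return z_bar, r, s

def explicit_coefficients(z_bar, r, tol=1e-12):
    """Coefficients (c_0, ..., c_{n-1}) of (5.3), or None if r_n is numerically zero."""
    if abs(r[-1]) < tol:
        return None
    c = -r[:-1] / r[-1]                # from r^T (z - z_bar) = 0 solved for y
    c0 = z_bar[-1] - c @ z_bar[:-1]    # the centroid lies in the hyperplane
    return np.concatenate([[c0], c])

# Points lying exactly on y = 1 + 2 x_1 - x_2 (invented test data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
Z = np.column_stack([X, 1.0 + 2.0 * X[:, 0] - X[:, 1]])
z_bar, r, s = tls_hyperplane(Z)
coeffs = explicit_coefficients(z_bar, r)
```

For exact data the smallest singular value vanishes and the coefficients of the generating plane are recovered; for noisy data the same code returns the hyperplane of best approximation in TLS-sense.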



In the dual approach we again consider the hyperplane (5.3), but now the y-coordinate has no special position in
the defining equation,

$$ \{ (x_1, \dots, x_{n-1}, y)^T \in \mathbb{R}^n \mid c_0 + c_1 x_1 + c_2 x_2 + \dots + c_{n-1} x_{n-1} + c_n y = 0 \} ; \tag{5.10} $$

instead of $c_n = -1$ we require $\sum_{k=1}^{n} c_k^2 = 1$. We choose (for each $i$) the point $(f_1^{(i)}, \dots, f_{n-1}^{(i)}, g_i)^T$ on this hyperplane
nearest to the datapoint $z_i$ ($i = 1 \dots m$). The first, second, etc. coordinates of these points form in $\mathbb{R}^m$ the vectors
$f_k$ ($k = 1 \dots n-1$) and $g$,

$$ f_k = (f_k^{(1)}, f_k^{(2)}, \dots, f_k^{(m)})^T \quad \text{and} \quad g = (g_1, g_2, \dots, g_m)^T , $$

which clearly satisfy the relation $c_0 e + c_1 f_1 + \dots + c_{n-1} f_{n-1} + c_n g = 0$. The minimization of the sum of squares
of distances from the datapoints $z_i$ to the hyperplane can now be reformulated as the problem of finding vectors $f_k$
($k = 1 \dots n-1$) and $g$ in $\mathbb{R}^m$ that minimize the functional

$$ \| y - g \|_2^2 + \sum_{k=1}^{n-1} \| x_k - f_k \|_2^2 \quad \text{subject to} \quad c_0 e + c_1 f_1 + \dots + c_{n-1} f_{n-1} + c_n g = 0 , \tag{5.11} $$

where $\sum_{k=1}^{n} c_k^2 = 1$. As in (4.9)-(4.10) we may restrict this minimization problem to $e^{\perp}$ and eliminate the unknown
$c_0 = -c_n \bar y - \sum_{k=1}^{n-1} c_k \bar x_k$ by orthogonalization w.r.t. $e$; essentially this amounts to the same as the shift to the
centroid in the primal approach in $\mathbb{R}^n$. So we find the restricted problem of finding vectors $\tilde f_k$ ($k = 1 \dots n-1$) and
$\tilde g$ that minimize

$$ \| y - \bar y e - \tilde g \|_2^2 + \sum_{k=1}^{n-1} \| x_k - \bar x_k e - \tilde f_k \|_2^2 \quad \text{subject to} \quad c_1 \tilde f_1 + \dots + c_{n-1} \tilde f_{n-1} + c_n \tilde g = 0 . $$

Without imposing it, the minimizing vectors are orthogonal to $e$ automatically, as in (4.10). Defining the matrices
$B$ and $E$,

$$ B := (x_1 - \bar x_1 e \mid \dots \mid x_{n-1} - \bar x_{n-1} e \mid y - \bar y e) \quad \text{and} \quad E := (\tilde f_1 \mid \dots \mid \tilde f_{n-1} \mid \tilde g) , $$

we can reformulate the problem as:

$$ \text{minimize } \| B - E \|_F^2 \quad \text{subject to} \quad \mathrm{rank}(E) = n - 1 . \tag{5.12} $$

In this form it is easily solved by the SVD. If $B = \sum_{i=1}^{n} \sigma_i u_i v_i^T$ then $E = \sum_{i=1}^{n-1} \sigma_i u_i v_i^T$ is a minimizer of (5.12),
which is unique if $\sigma_{n-1} > \sigma_n$. The coefficients $c_1, \dots, c_n$ determining the hyperplane are the coordinates of the
right singular vector $v_n$ as before:

$$ E v_n = 0 . $$



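6. General Least Squares. For a given matrix $A \in \mathbb{R}^{m \times n}$ with $m > n$ and right-hand side $b \in \mathbb{R}^m$ we consider
the problem to find the minimizer $c \in \mathbb{R}^n$ of the functional

$$ J(c) := \| A c - b \|_2^2 \quad \text{with} \quad c = (c_1, \dots, c_n)^T , \tag{6.1} $$

where

$$ A := \begin{pmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{pmatrix} \in \mathbb{R}^{m \times n} \quad \text{and} \quad b := \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \in \mathbb{R}^m \quad (m > n) . $$

The difference with (5.4) is that $A$ need not contain a column consisting of all ones. The solution is obtained by a
column space argument as in (5.5), namely that $J(c)$ is minimal iff $b - Ac$ is orthogonal to $\mathrm{Im}(A)$, and it may be
computed by normal equations, QR-factorization or SVD.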



What is interesting for the TLS generalization is the interpretation of (6.1) in row space. We have introduced
the TLS approximation in the sections 4 and 5 as the one that minimizes the sum of squares of the true distances of
$m$ points to a hyperplane, whereas ordinary least squares measures the distances along the y-axis. We can interpret
(6.1) in this sense. The rows of the extended matrix $(A \mid -b)$ define a cloud of $m$ points in $\mathbb{R}^{n+1}$,

$$ z_k := (a_{k,1}, \dots, a_{k,n}, -b_k)^T \in \mathbb{R}^{n+1} \quad \text{such that} \quad (z_1 \mid \dots \mid z_m) = (A \mid -b)^T , \tag{6.2} $$

to which we try to fit a linear function $b(x_1, \dots, x_n) = c_1 x_1 + \dots + c_n x_n$. In other words, we look for an n-dimensional
subspace $\hat c^{\perp}$ in $\mathbb{R}^{n+1}$ (and not a hyperplane in $\mathbb{R}^n$ as in the regression problem) that is nearest to the datapoints
(6.2), minimizing

$$ J(c) = \sum_{k=1}^{m} ( z_k^T \hat c )^2 \quad \text{where} \quad \hat c := \binom{c}{1} \in \mathbb{R}^{n+1} . \tag{6.3} $$

In this sum of squares the quantity $z_k^T \hat c$ measures the distance from $z_k$ to $\hat c^{\perp}$ along the n+1-st coordinate axis.



The Total Least Squares approximation for the cloud of points (6.2) minimizes the sum of squares of true
distances to the subspace $\hat c^{\perp}$. As the true distance from $z_k$ to the subspace is given by $|z_k^T \hat c| / \sqrt{\hat c^T \hat c}$, see (4.3), the
TLS-approximation minimizes the functional:

$$ I(c) = \sum_{k=1}^{m} \frac{( z_k^T \hat c )^2}{\hat c^T \hat c} = \frac{\| (A \mid -b)\, \hat c \|_2^2}{\| \hat c \|_2^2} \quad \text{where} \quad \hat c := \binom{c}{1} . \tag{6.4} $$

The functional $r \mapsto \| (A \mid -b)\, r \|_2^2$ subject to $\|r\| = 1$ is minimal if $r$ is the right singular vector corresponding to the
smallest singular value of the matrix $(A \mid -b)$. Renormalizing the last component to 1, if possible, so that
$(A \mid -b)\, r = A c - b$ as in (6.3), provides the solution to the TLS problem for the overdetermined system of equations
$Ax = b$. If the n+1-st component of this right singular vector is zero, no solution exists to the TLS-problem. The
solution is unique if $\sigma_n > \sigma_{n+1}$.
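The recipe just described fits in a few lines. The following NumPy sketch is an illustration, not a robust solver: the function name `tls_solve`, the tolerance `tol` and both test systems are assumptions introduced here, and uniqueness ($\sigma_n > \sigma_{n+1}$) is not checked. The second system is a hypothetical one constructed so that the last component of the singular vector vanishes.

```python
import numpy as np

def tls_solve(A, b, tol=1e-12):
    """TLS solution of A x ~ b via the SVD of (A | -b); None if it does not exist."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    C = np.column_stack([A, -b])
    _, s, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[-1]                  # right singular vector of the smallest singular value
    if abs(v[n]) < tol:         # last component zero: no TLS solution
        return None
    return v[:n] / v[n]         # rescale so that (A | -b)(x, 1)^T = A x - b

# A consistent system: TLS reproduces the exact solution.
A_ok = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
x_ok = tls_solve(A_ok, A_ok @ np.array([1.0, -2.0]))

# A hypothetical degenerate system: the second column of A is zero and the
# singular vector of the smallest singular value is (0, 1, 0)^T up to sign.
A_deg = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
x_deg = tls_solve(A_deg, np.array([1.0, 1.0, 1.0]))
```

In the degenerate case the function returns `None`, which mirrors the statement that no TLS solution exists when the n+1-st component is zero.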



Interpretation of TLS in Column Space: To each point $z_k$ ($k = 1 \dots m$) in the cloud (6.2),

$$ z_k = \begin{pmatrix} a_{k,1} \\ \vdots \\ a_{k,n} \\ -b_k \end{pmatrix} , \quad \text{corresponds its best approximation} \quad w_k = \begin{pmatrix} f_{k,1} \\ \vdots \\ f_{k,n} \\ -g_k \end{pmatrix} \in \hat c^{\perp} . \tag{6.5} $$

The TLS-approximation minimizes the sum of squares of the distances between the (given) points $z_k$ and the points
$w_k$ in the subspace $\hat c^{\perp}$. We can write this sum of squares as the Frobenius norm of a matrix, if we consider the
components $f_{k,j}$ as the elements of a matrix $F \in \mathbb{R}^{m \times n}$, and the components $g_k$ as the components of a vector
$g \in \mathbb{R}^m$. Hence, TLS minimizes

$$ \sum_{k=1}^{m} \| z_k - w_k \|_2^2 = \| A - F \|_F^2 + \| b - g \|_2^2 = \| (A \mid -b) - (F \mid -g) \|_F^2 . \tag{6.6} $$

Since the rows of the matrix $E := (F \mid -g) \in \mathbb{R}^{m \times (n+1)}$ are orthogonal to $\hat c$, the rank of $E$ is n at most. In
other words, TLS minimizes

$$ \| (A \mid -b) - E \|_F^2 \quad \text{subject to} \quad E \in \mathbb{R}^{m \times (n+1)} \ \text{ and } \ \mathrm{rank}(E) \le n . \tag{6.7} $$

We may interpret this as the quest for the solution of the solvable linear system $Fc = g$ "nearest" to the (unsolvable)
system $Ax = b$, where "solvable" means: $g \in \mathrm{Im}(F)$.

The minimization problem (6.7) is solved by the SVD. If $(A \mid -b) = \sum_{i=1}^{n+1} \sigma_i u_i v_i^T$, then $E = \sum_{i=1}^{n} \sigma_i u_i v_i^T$
and the required solution of the TLS-problem is the null-vector $v_{n+1}$ of $E$, i.e. the right singular vector $v_{n+1}$ of
$(A \mid -b)$ corresponding to the smallest singular value $\sigma_{n+1}$, provided the n+1-st component is non-zero. As stated
at the end of section 4, the formulation (6.7) takes a detour in comparison to the equivalent formulation (6.4)
in that it asks for a minimizing system of equations, instead of the solution $c$ itself.

We conclude that in general a best approximation of the overdetermined system $Ax = b$ in TLS-sense may
not exist, because we are not satisfied with the subspace as in a problem of regression; we want the equation for the
subspace $b = c_1 x_1 + \dots + c_n x_n$ to be explicit w.r.t. $b$. Furthermore, the solution is not necessarily unique. We shall
illustrate this by two examples.

Example 1: Consider the cloud of 4 points in $\mathbb{R}^2$:

(1,1), (-1,1), (1,-1), and (-1,-1).

The LS-approximation is the horizontal line $\{(x, y) \mid y = 0\}$. The TLS-approximation makes the SVD of the matrix

$$ B := \begin{pmatrix} 1 & 1 \\ -1 & 1 \\ 1 & -1 \\ -1 & -1 \end{pmatrix} , $$

for which $B^T B = 4 I$, so that $\sigma_1 = \sigma_2 = 2$. As both singular values are equal, there is no uniqueness; every line
through the origin provides a solution, as shown in fig. 6. The sum of squares of distances from the points to a line
with slope $\tan\varphi$ is independent of the slope.

Fig. 6. Example 1: $\xi^2 + \eta^2 = (1 + \tan\varphi)^2 \cos^2\varphi + (1 - \tan\varphi)^2 \cos^2\varphi = 2$, independent of $\varphi$.
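The claim of Example 1 can be verified directly. In the NumPy sketch below (the sampled angles and variable names are assumptions for illustration) the sum of squared distances is taken over all four points, so it equals 4, i.e. twice the value 2 that the figure caption gives for one pair of points.

```python
import numpy as np

# The four points of Example 1; their centroid is the origin,
# so the matrix B of (4.7) simply holds the points as rows.
B = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
singular_values = np.linalg.svd(B, compute_uv=False)

# Sum of squared distances to the line through the origin with slope tan(phi);
# r = (-sin(phi), cos(phi)) is the unit normal of that line.
phis = np.linspace(0.0, np.pi, 7)
sums = np.array(
    [np.linalg.norm(B @ np.array([-np.sin(p), np.cos(p)])) ** 2 for p in phis]
)
```

Both singular values coincide and `sums` is constant over all sampled angles, including the vertical line.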



Example 2: Solve the following problem in LS-sense and TLS-sense: 




The normal equations for the LS-approximation give $x = 1$ and $y$ undetermined.

In the SVD for the TLS-problem the singular value $\sqrt{2 + \sqrt{2}}$ occurs, and the smallest singular value is 0. However,
the 3rd component of the corresponding right singular vector $(0, 1, 0)^T$ is 0 as well, such that no TLS-solution exists!

7. Generalizations: (a) Multiple RHS. In ordinary least squares there is no difference between the treatment
of one and multiple right-hand sides (RHS). In Total Least Squares the column space of the matrix is bent towards
the RHS. If several RHS's are given, we can treat each of them separately and compute the SVD of an extended
matrix for each RHS. In a different approach we can try to bend the matrix to all RHS's collectively. So we consider
the problem: given $A \in \mathbb{R}^{m \times n}$ ($m > n + p$) and $B \in \mathbb{R}^{m \times p}$, find $X \in \mathbb{R}^{n \times p}$ that solves the overdetermined system of
equations $AX = B$ in TLS-sense. By analogy to (6.6) we have to find the solution $X$ of a solvable matrix equation
$FX = G$ (i.e. $\mathrm{Im}(G) \subset \mathrm{Im}(F)$) nearest to $AX = B$; we have to minimize

$$ \| A - F \|_F^2 + \| B - G \|_F^2 \quad \text{subject to} \quad F \in \mathbb{R}^{m \times n} , \ G \in \mathbb{R}^{m \times p} \ \text{ and } \ F X = G . \tag{7.1} $$

Otherwise stated, find an approximation $E = (F \mid G) \in \mathbb{R}^{m \times (n+p)}$ to $(A \mid B)$, such that

$$ \| (A \mid B) - E \|_F^2 \ \text{ is minimal subject to } \ \mathrm{rank}(E) = n . \tag{7.2} $$



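The solution of (7.2) is constructed by making the SVD of $(A \mid B)$:

$$ (A \mid B) = U \Sigma V^T = ( U_1 \mid U_2 ) \begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_{1,1} & V_{1,2} \\ V_{2,1} & V_{2,2} \end{pmatrix}^T , \tag{7.3} $$

with blocks $U_1$ ($m \times n$), $U_2$ ($m \times p$), $\Sigma_1$ ($n \times n$), $\Sigma_2$ ($p \times p$), $V_{1,1}$ ($n \times n$), $V_{1,2}$ ($n \times p$), $V_{2,1}$ ($p \times n$) and $V_{2,2}$ ($p \times p$).

Theorem. If we assume:

a. $\mathrm{rank}(V_{2,2}) = p$,

b. $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_n, \sigma_{n+1}, \dots, \sigma_{n+p})$ with $\sigma_j \ge \sigma_{j+1}$ and $\sigma_n \ne \sigma_{n+1}$,

then the TLS problem (7.2) has the unique solution $X = -V_{1,2} V_{2,2}^{-1}$.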

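Proof: From (7.3) and the assumption $\sigma_n > \sigma_{n+1}$ it follows that the best rank n approximation¹ of $(A \mid B)$ in the
Frobenius norm is given by $E$,

$$ E := ( U_1 \mid U_2 ) \begin{pmatrix} \Sigma_1 & \\ & 0 \end{pmatrix} \begin{pmatrix} V_{1,1} & V_{1,2} \\ V_{2,1} & V_{2,2} \end{pmatrix}^T = U_1 \Sigma_1 ( V_{1,1}^T \mid V_{2,1}^T ) = (F \mid G) , \tag{7.4} $$

where $F := U_1 \Sigma_1 V_{1,1}^T$ and $G := U_1 \Sigma_1 V_{2,1}^T$. The orthogonality of the columns of $V$ implies

$$ \begin{pmatrix} V_{1,1} \\ V_{2,1} \end{pmatrix}^T \begin{pmatrix} V_{1,2} \\ V_{2,2} \end{pmatrix} = (0) \quad \text{and hence} \quad E \begin{pmatrix} V_{1,2} \\ V_{2,2} \end{pmatrix} = F V_{1,2} + G V_{2,2} = (0) . $$

Under the assumption $\mathrm{rank}(V_{2,2}) = p$ we may conclude that $X := -V_{1,2} V_{2,2}^{-1}$ solves the approximate equation
$FX = G$. □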
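The formula of the theorem can be tried out on a consistent system, for which the TLS solution must coincide with the exact one. The NumPy sketch below is a hedged illustration; the function name `tls_multi_rhs` and the test matrices are assumptions, and the code simply fails with a `LinAlgError` when $V_{2,2}$ is exactly singular.

```python
import numpy as np

def tls_multi_rhs(A, B):
    """TLS solution X = -V12 V22^{-1} of A X ~ B, cf. (7.3); assumes V22 nonsingular."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[1]
    _, s, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=False)
    V = Vt.T
    V12, V22 = V[:n, n:], V[n:, n:]
    # X = -V12 V22^{-1}, computed by solving V22^T X^T = -V12^T.
    X = np.linalg.solve(V22.T, -V12.T).T
    return X, s

# Invented consistent test problem with m = 5, n = 2, p = 2.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [1.0, 3.0]])
X_true = np.array([[1.0, 2.0], [-1.0, 0.5]])
X, s = tls_multi_rhs(A, A @ X_true)
```

Here the last p singular values vanish, so $(A \mid B)$ itself has rank n and the recovered X equals `X_true` up to rounding.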

¹ See [?], theorem 2.5.2.

(b) Fixed columns: In section 3 we have introduced the simple (bivariate) regression problem and we have shown
that it is solved in LS-sense by the LS-solution of the overdetermined system of equations $A \binom{a}{b} = (e \mid x) \binom{a}{b} = y$
(cf. 3.6). However, as explained in section 6, the TLS-solution of this overdetermined system of equations is derived
from the SVD of the matrix $(e \mid x \mid y) \in \mathbb{R}^{m \times 3}$. This differs from the TLS-solution of the regression problem, which
is derived from the SVD of $B := (x - \bar x e \mid y - \bar y e) \in \mathbb{R}^{m \times 2}$, cf. eq. (4.7). The reason for this difference is that
the formulation of the regression problem as an overdetermined set of equations $A \binom{a}{b} = y$ has lost its geometric
interpretation as a line $y = a + bx$ in the (x, y)-plane. In the LS-solution this makes no difference since all uncertainty
is put in the y-column. However, TLS for $A \binom{a}{b} = y$ puts uncertainty in all three columns $e$, $x$ and $y$, although in
the regression problem there is no reason to postulate uncertainty in the "constant term". The TLS-solution of the
regression problem can be regained from $A \binom{a}{b} = y$ if we "freeze" the first column of $A$ and put uncertainty in the
columns $x$ and $y$ only as in eq. (4.8). The solution is obtained by orthogonalization w.r.t. the frozen column $e$.

This motivates the study of the TLS-problem for $AX = B$ with frozen columns, see [?], where uncertainty is
postulated in a part of the columns of $A$ (LS is a special case, all columns of the matrix being frozen!). So we assume
that the matrix $A$ is partitioned in a frozen part $A_1 \in \mathbb{R}^{m \times j}$ and a part $A_2 \in \mathbb{R}^{m \times k}$ containing some uncertainty
with $j + k = n$. Given a right-hand side $B \in \mathbb{R}^{m \times p}$ with $m > j + k + p$, we seek matrices $X_1 \in \mathbb{R}^{j \times p}$ and $X_2 \in \mathbb{R}^{k \times p}$,
such that

$$ A_1 X_1 + A_2 X_2 = B \quad \text{in TLS-sense w.r.t. } A_2 \text{ and } B \text{, keeping } A_1 \text{ fixed.} \tag{7.5} $$

More precisely, minimize among all $C \in \mathbb{R}^{m \times k}$ and $D \in \mathbb{R}^{m \times p}$

$$ \| A_2 - C \|_F^2 + \| B - D \|_F^2 \quad \text{subject to} \quad A_1 X_1 + C X_2 = D , \tag{7.6} $$

or otherwise said, subject to the condition $\mathrm{rank}(A_1 \mid C \mid D) = j + k = n$.

Guided by the idea of (4.8), where we orthogonalized w.r.t. the frozen column, we find the
solution:

a. Orthogonalize the columns of $A_2$ and $B$ w.r.t. the columns of $A_1$.

b. Solve the TLS-problem in the orthogonal complement $\mathrm{Im}(A_1)^{\perp}$.

Proof: If $A_1$ is of full column rank ($\mathrm{rank}(A_1) = j$), we make the QR-factorization

$$ A_1 = U \begin{pmatrix} R_1 \\ 0 \end{pmatrix} \quad \text{with } U \in \mathbb{R}^{m \times m} \text{ orthogonal and } R_1 \in \mathbb{R}^{j \times j} . \tag{7.7} $$

Because the Frobenius norm is orthogonally invariant, the functional (7.6) is equal to

$$ \| U^T A_2 - U^T C \|_F^2 + \| U^T B - U^T D \|_F^2 . \tag{7.8} $$

Partitioning the matrices in parts consisting of the topmost $j$ rows and the remaining $m - j$ rows respectively,

$$ \begin{pmatrix} A_{12} \\ A_{22} \end{pmatrix} := U^T A_2 , \quad \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} := U^T B , \quad \begin{pmatrix} C_1 \\ C_2 \end{pmatrix} := U^T C , \quad \begin{pmatrix} D_1 \\ D_2 \end{pmatrix} := U^T D , \tag{7.9} $$

we can rewrite the functional as

$$ \| A_{12} - C_1 \|_F^2 + \| B_1 - D_1 \|_F^2 + \| A_{22} - C_2 \|_F^2 + \| B_2 - D_2 \|_F^2 . \tag{7.10} $$

It has to be minimized subject to the equations $R_1 X_1 + C_1 X_2 = D_1$ and $C_2 X_2 = D_2$. If $X_2$ is known, and if
we choose $C_1 = A_{12}$ and $D_1 = B_1$, the first two terms in (7.10) vanish and $X_1$ can be solved from the equation
$R_1 X_1 + C_1 X_2 = D_1$. Hence it suffices to minimize

$$ \| A_{22} - C_2 \|_F^2 + \| B_2 - D_2 \|_F^2 \quad \text{subject to} \quad C_2 X_2 = D_2 . \tag{7.11} $$

This is solved as eq. (7.2) by the SVD of $(A_{22} \mid B_2)$, whose best approximation of rank $k$ is $(C_2 \mid D_2)$.

If $A_1$ is not of full column rank ($\mathrm{rank}(A_1) = r < j$), we use the SVD of $A_1$:

$$ A_1 = U \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix} \quad \text{with } U \in \mathbb{R}^{m \times m} , \ V_1 \in \mathbb{R}^{j \times r} , \ V_2 \in \mathbb{R}^{j \times (j - r)} . $$

With the same partitioning as in (7.9), but now with the $r$ topmost rows in the upper parts and the remaining $m - r$
rows in the lower parts, we arrive at the minimization of (7.10) subject to the conditions

$$ \Sigma_1 V_1^T X_1 + C_1 X_2 = D_1 \quad \text{and} \quad C_2 X_2 = D_2 . \tag{7.12} $$

Choosing $C_1 = A_{12}$ and $D_1 = B_1$ and solving $X_2$ from (7.11) we can solve $V_1^T X_1$ from (7.12). This makes the first
two terms in (7.10) zero, such that the problem again is reduced to the form (7.2). As in standard LS-problems
in which the matrix is not of full column rank, the part $X_1$ is not uniquely defined; we may add to it any linear
combination of the columns of $V_2$. □
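The full-rank case of the proof, steps (7.7) to (7.11), can be sketched as follows. This is an illustration under stated assumptions rather than a production routine: the function name `tls_frozen` and the sample data are invented, $A_1$ is assumed to have full column rank, and the block $V_{2,2}$ of the reduced problem is assumed nonsingular. The example freezes the column e of a simple regression problem.

```python
import numpy as np

def tls_frozen(A1, A2, B):
    """TLS for A1 X1 + A2 X2 ~ B with A1 frozen (A1 assumed of full column rank)."""
    j, k = A1.shape[1], A2.shape[1]
    Q, R = np.linalg.qr(A1, mode="complete")      # (7.7): A1 = Q (R1; 0)
    R1 = R[:j]
    QA2, QB = Q.T @ A2, Q.T @ B                   # (7.9): split top j rows / rest
    A12, A22 = QA2[:j], QA2[j:]
    B1, B2 = QB[:j], QB[j:]
    # (7.11): ordinary TLS for A22 X2 ~ B2 via the SVD of (A22 | B2).
    _, _, Vt = np.linalg.svd(np.hstack([A22, B2]), full_matrices=False)
    V = Vt.T
    X2 = np.linalg.solve(V[k:, k:].T, -V[:k, k:].T).T
    # Remaining equation R1 X1 + A12 X2 = B1.
    X1 = np.linalg.solve(R1, B1 - A12 @ X2)
    return X1, X2

# Simple regression with the constant column frozen (invented data).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.2, 0.9, 2.3, 2.8, 4.1])
X1, X2 = tls_frozen(np.ones((5, 1)), x[:, None], y[:, None])
```

With the column e frozen, X2 is the slope and X1 the intercept of the TLS line of section 4, i.e. the line obtained from the SVD of the centered matrix (4.7).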